# Gradient Descent
Problem: minimize \(f(x)\), \(x \in \mathbb{R}^d\).
Idea: the gradient of a function at a point gives the direction of steepest ascent of the function at that point. So, if we move in the opposite direction of the gradient, we should move toward a local minimum:

$$
x_{i+1} = x_i - \alpha \nabla f(x_i),
$$

where \(\alpha\) is the step size or learning rate, \(x_i\) is the current point, \(x_{i+1}\) is the next point, and \(x_0\) is the initial point.
We repeat this process until \(\|\nabla f(x_i)\|\) is close to 0 for some \(i\).
Note:

- This algorithm tends to converge to the nearest local minimum.
- If \(\nabla f(x_i) = 0\), then we are at a stationary point, so the iteration stops moving.
- In the simplest form, \(\alpha\) is fixed. More advanced variants make it adaptive or randomized.
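To make the update rule concrete, here is a minimal sketch of gradient descent in one dimension. The `gradient_descent` helper, the test function \(f(x) = (x - 3)^2\), the step size, and the tolerance are all illustrative choices, not part of the example that follows.

```python
import numpy as np

def gradient_descent(grad, x0, alpha=0.1, tol=1e-8, max_iter=1000):
    """Minimize a function given its gradient, starting from x0."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        # Stop once the gradient is close to zero (stationary point).
        if np.linalg.norm(g) < tol:
            break
        # Move against the gradient, scaled by the step size alpha.
        x = x - alpha * g
    return x

# Illustrative example: f(x) = (x - 3)^2, so f'(x) = 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=np.array([0.0]))
print(x_min)  # approximately [3.]
```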
## Interactive Visualization
## Example
Use gradient descent to solve a linear regression problem.
For a single-variable linear regression model \(f(x) = wx + b\) and a dataset \(\{ (x_i, y_i) \}\), the Mean Squared Error (MSE) loss function is defined as:

$$
L(w, b) = \frac{1}{N} \sum_{i=1}^{N} \bigl( y_i - (w x_i + b) \bigr)^2,
$$

where \(N\) is the number of data points.
### Derivatives of the Loss Function
The gradients of the loss function with respect to \(w\) and \(b\) are:

$$
\frac{\partial L}{\partial w} = -\frac{2}{N} \sum_{i=1}^{N} x_i \bigl( y_i - (w x_i + b) \bigr),
\qquad
\frac{\partial L}{\partial b} = -\frac{2}{N} \sum_{i=1}^{N} \bigl( y_i - (w x_i + b) \bigr).
$$

Using gradient descent, the parameters \(w\) and \(b\) are updated as:

$$
w \leftarrow w - \alpha \frac{\partial L}{\partial w},
\qquad
b \leftarrow b - \alpha \frac{\partial L}{\partial b},
$$

where \(\alpha\) is the learning rate.
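As a sanity check, the analytic gradients above can be compared against finite-difference approximations. This is a quick illustrative sketch: the synthetic data, the test point `(w0, b0)`, and the perturbation `eps` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=20)
y = 4 + 3 * x + rng.normal(size=20)

def loss(w, b):
    return np.mean((y - (w * x + b)) ** 2)

def analytic_grads(w, b):
    residual = y - (w * x + b)
    return -2 * np.mean(x * residual), -2 * np.mean(residual)

# Central finite differences at an arbitrary test point.
w0, b0, eps = 2.0, 1.0, 1e-6
dw_num = (loss(w0 + eps, b0) - loss(w0 - eps, b0)) / (2 * eps)
db_num = (loss(w0, b0 + eps) - loss(w0, b0 - eps)) / (2 * eps)

dw_an, db_an = analytic_grads(w0, b0)
print(np.isclose(dw_num, dw_an), np.isclose(db_num, db_an))  # True True
```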
```python
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import HTML, display

# Step 1: Generate synthetic data
np.random.seed(42)
x = 2 * np.random.rand(100, 1)
y = 4 + 3 * x + np.random.randn(100, 1)

# Grid of x values for plotting the fitted straight line
gridx = np.linspace(0, 2, 100).reshape(-1, 1)

# Step 2: Define the loss function
def compute_loss(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Create a grid of w and b values to visualize the loss landscape
w_values = np.linspace(0, 8, 200)
b_values = np.linspace(0, 8, 200)
w_grid, b_grid = np.meshgrid(w_values, b_values)
loss_grid = np.zeros_like(w_grid)

# Calculate the loss for each combination of w and b
for i in range(w_grid.shape[0]):
    for j in range(w_grid.shape[1]):
        w, b = w_grid[i, j], b_grid[i, j]
        y_pred = w * x + b
        loss_grid[i, j] = compute_loss(y, y_pred)

# Step 3: Implement gradient descent on w and b
learning_rate = 0.3
n_iterations = 50
w = 2
b = 1

# Store values for plotting the trajectory
trajectory = [(w, b)]
y_preds = [w * gridx + b]

# Track the parameter values across iterations
ws = [w]
bs = [b]

for iteration in range(n_iterations):
    # Predictions with the current parameters
    y_pred = ws[-1] * x + bs[-1]
    # Compute gradients of the MSE loss
    w_gradient = -2 * np.mean(x * (y - y_pred))
    b_gradient = -2 * np.mean(y - y_pred)
    # Update parameters
    ws.append(ws[-1] - learning_rate * w_gradient)
    bs.append(bs[-1] - learning_rate * b_gradient)
    # Save the trajectory and predictions
    trajectory.append((ws[-1], bs[-1]))
    y_preds.append(ws[-1] * gridx + bs[-1])

# Prepare the figure
fig, axs = plt.subplots(1, 2, figsize=(12, 6))

# Left subplot: Loss landscape and trajectory
levels = [0, 0.1, 1, 2, 5, 10, 15, 20, 30, 50, 75, 100]
cp = axs[0].contour(w_grid, b_grid, loss_grid, levels=levels, cmap='viridis')
axs[0].clabel(cp, inline=True, fontsize=10)
traj_plot, = axs[0].plot([], [], 'r-o', label='Gradient Descent Path')
axs[0].set_xlabel('w')
axs[0].set_ylabel('b')
axs[0].set_title('Trajectory of Gradient Descent in the Loss Landscape')
axs[0].legend()

# Right subplot: Current line over noisy data
scatter = axs[1].scatter(x, y, color='blue', label='Noisy Data')
line, = axs[1].plot([], [], 'r-', label='Current Fit')
axs[1].set_xlabel('x')
axs[1].set_ylabel('y')
axs[1].set_title('Current Fit of the Model')
axs[1].legend()

def animate(i):
    # Update the trajectory plot with the first i+1 points
    traj_plot.set_data(*zip(*trajectory[:i + 1]))
    # Update the fitted-line plot
    y_pred = y_preds[i]
    line.set_data(gridx.flatten(), y_pred.flatten())
    return traj_plot, line

# Animate every stored state, including the initial one
ani = animation.FuncAnimation(fig, animate, frames=len(trajectory),
                              interval=200, blit=True)

# Display the animation in the notebook
html_output = HTML(ani.to_jshtml())
display(html_output)
```
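Since the data were generated as \(y = 4 + 3x + \text{noise}\), the learned parameters should land near \(w = 3\) and \(b = 4\). A quick check, assuming the cell above has been run:

```python
# Compare the learned parameters with the generating values (slope 3, intercept 4).
print(f"learned w = {ws[-1]:.3f}, learned b = {bs[-1]:.3f}")
print(f"final loss = {compute_loss(y, ws[-1] * x + bs[-1]):.3f}")
```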